Experience with a Distributed File System Implementation
Authors
Abstract
This paper highlights some of the lessons learned during the course of implementing xFS, a fully distributed file system. xFS is an interesting case study for two reasons. First, xFS's serverless architecture leads to more complex distributed programming issues than are faced by traditional client-server operating system services. Second, xFS implements a complex, multithreaded service that is tightly coupled with the underlying operating system. This combination turned out to be quite challenging. On one hand, the complexity of the system forced us to turn to distributed programming tools based on formal methods to verify the correctness of our distributed algorithms; on the other hand, the complex interactions with the operating system on individual nodes violated some of the tools' assumptions, making it difficult to use them in this environment. Furthermore, the xFS system tested the limits of abstractions such as threads, RPC, and vnodes that have traditionally been used in building distributed file systems. Based on our experience, we suggest several strategies that should be followed by those wishing to build distributed operating system services, and we also indicate several areas where programming tools and operating system abstractions might be improved.
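The abstract mentions verifying distributed algorithms with tools based on formal methods. As a rough illustration of what such tools automate, and not the tooling or protocol used in the paper, the sketch below exhaustively explores every reachable state of a hypothetical, heavily simplified write-invalidate cache-coherence protocol and checks a single-writer invariant. The state names, the atomic transitions, and the `buggy` flag are all assumptions made for illustration; a realistic model would also include message delays, reordering, and node failures.

```python
from collections import deque

# Hypothetical per-client cache states for a toy write-invalidate protocol.
INVALID, SHARED, MODIFIED = "I", "S", "M"


def successors(state, buggy=False):
    """Yield every state reachable from `state` in one atomic protocol step."""
    n = len(state)
    for i in range(n):
        # Read miss: any modified copy elsewhere is downgraded to shared.
        if state[i] == INVALID:
            s = [SHARED if c == MODIFIED else c for c in state]
            s[i] = SHARED
            yield tuple(s)
        # Write: client i takes exclusive ownership of the block.
        if state[i] != MODIFIED:
            # The correct protocol invalidates every other copy;
            # the deliberately buggy variant forgets to.
            s = list(state) if buggy else [INVALID] * n
            s[i] = MODIFIED
            yield tuple(s)
        # Eviction: client i drops its cached copy.
        if state[i] != INVALID:
            s = list(state)
            s[i] = INVALID
            yield tuple(s)


def single_writer(state):
    """Invariant: if any client holds a modified copy, no other copy is valid."""
    return MODIFIED not in state or state.count(INVALID) == len(state) - 1


def check(n_clients, buggy=False):
    """Breadth-first search of all reachable states.

    Returns (number of states discovered, first violating state or None).
    """
    start = (INVALID,) * n_clients
    seen = {start}
    frontier = deque([start])
    while frontier:
        state = frontier.popleft()
        if not single_writer(state):
            return len(seen), state
        for nxt in successors(state, buggy):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return len(seen), None


if __name__ == "__main__":
    print(check(2))               # (6, None): correct protocol
    print(check(2, buggy=True))   # reports a violating state
```

With two clients, the correct variant reaches six of the nine possible cache configurations and none breaks the invariant, while the buggy variant reaches a state in which a modified copy coexists with another valid copy. Even in this toy model the configuration space grows as 3^n in the number of clients, which hints at why practical checkers rely on simplifying assumptions about the environment they model.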
Similar resources
Experience Using a Globally Shared State Abstraction to Support Distributed Applications
In this paper, we evaluate the effectiveness of basing distributed systems on a persistent globally shared address space abstraction, as implemented by Khazana. Khazana provides shared state management services to distributed application developers, including consistent caching, automated replication and migration of data, location management, access control, and (limited) fault tolerance. We r...
Availability in the Sprite Distributed File System
In the Sprite environment, tolerating faults means recovering from them quickly. Our position is that performance and availability are the desired features of the typical locally-distributed office/engineering environment, and that very fast server recovery is the most cost-effective way of providing such availability. Mechanisms used for reliability can be inappropriate in systems with the prim...
Experience with a Language for Writing Coherence Protocols
In this paper we describe our experience with Teapot [7], a domain-specific language for addressing the cache coherence problem. The cache coherence problem arises when parallel and distributed computing systems make local replicas of shared data for reasons of scalability and performance. In both distributed shared memory systems and distributed file systems, a coherence protocol maintains agr...
Long Term Distributed File Reference Tracing: Implementation and Experience
DFSTrace is a system to collect and analyze long-term file reference data in a distributed UNIX workstation environment. The design of DFSTrace is unique in that it pays particular attention to the efficiency, extensibility and the logistics of long-term trace data collection in a distributed environment. The components of DFSTrace are a set of kernel hooks, a kernel buffer mechanism, a data ex...
srfs kernel module
A distributed file system should appear to the user as a traditional file system. The user can create files, delete them, open and read files, write data, etc., all without needing to concern herself with the underlying implementation. Under the hood, the distributed file system usually resides on several servers, which provide load balancing and fault tolerance. There are two extremes of distribut...